Bayesian Network Classification with Continuous Attributes: Getting the Best of Both Discretization and Parametric Fitting

نویسندگان

Nir Friedman

Moisés Goldszmidt

Thomas J. Lee

چکیده

In a recent paper, Friedman, Geiger, and Goldszmidt [8] introduced a classifier based on Bayesian networks, called Tree Augmented Naive Bayes (TAN), that outperforms naive Bayes and performs competitively with C4.5 and other state-of-the-art methods. This classifier has several advantages including robustness and polynomial computational complexity. One limitation of the TAN classifier is that it applies only to discrete attributes, and thus, continuous attributes must be prediscretized. In this paper, we extend TAN to deal with continuous attributes directly via parametric (e.g., Gaussians) and semiparametric (e.g., mixture ofGaussians) conditional probabilities. The result is a classifier that can represent and combine both discrete and continuous attributes. In addition, we propose a new method that takes advantage of the modeling language of Bayesian networks in order to represent attributes both in discrete and continuous form simultaneously, and use both versions in the classification. This automates the process of deciding which form of the attribute is most relevant to the classification task. It also avoids the commitment to either a discretized or a (semi)parametric form, since different attributes may correlate better with one version or the other. Our empirical results show that this latter method usually achieves classification performance that is as good as or better than either the purely discrete or the purely continuous TAN models.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Discretizing Continuous Attributes Using Information Theory

Many classification algorithms require that training examples contain only discrete values. In order to use these algorithms when some attributes have continuous numeric values, the numeric attributes must be converted into discrete ones. This paper describes a new way of discretizing numeric values using information theory. The amount of information each interval gives to the target attribute ...

متن کامل

Bayesian network classifiers which perform well with continuous attributes: Flexible classifiers

When modelling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous works have solved the problem by discretizing them with the consequent loss of information. Another common alternative assumes that the data are generated by a Gaussian distribution (parametric approach), such as conditional Gaussian networks, wit...

متن کامل

Impact of Patients’ Gender on Parkinson’s disease using Classification Algorithms

In this paper the accuracy of two machine learning algorithms including SVM and Bayesian Network are investigated as two important algorithms in diagnosis of Parkinson’s disease. We use Parkinson's disease data in the University of California, Irvine (UCI). In order to optimize the SVM algorithm, different kernel functions and C parameters have been used and our results show that SVM with C par...

متن کامل

A Hellinger-based discretization method for numeric attributes in classification learning

Many classification algorithms require that training examples contain only discrete values. In order to use these algorithms when some attributes have continuous numeric values, the numeric attributes must be converted into discrete ones. This paper describes a new way of discretizing numeric values using information theory. Our method is context-sensitive in the sense that it takes into accoun...

متن کامل

Discretizing Continuous Attributes While Learning Bayesian Networks

We introduce a method for learning Bayesian networks that handles the discretization of continuous variables as an integral part of the learning process. The main ingredient in this method is a new metric based on the Minimal Description Length principle for choosing the threshold values for the discretization while learning the Bayesian network structure. This score balances the complexity of ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 1998

Bayesian Network Classification with Continuous Attributes: Getting the Best of Both Discretization and Parametric Fitting

نویسندگان

چکیده

منابع مشابه

Discretizing Continuous Attributes Using Information Theory

Bayesian network classifiers which perform well with continuous attributes: Flexible classifiers

Impact of Patients’ Gender on Parkinson’s disease using Classification Algorithms

A Hellinger-based discretization method for numeric attributes in classification learning

Discretizing Continuous Attributes While Learning Bayesian Networks

عنوان ژورنال:

اشتراک گذاری